L Number | Hits | Search Text | DB | Time stamp
[illegible] | 1 | ldap and (xml-schema) and (maintain$4 adj session) | US-PGPUB; EPO; JPO; DERWENT; IBM_TDB | [illegible]
7 | 0 | ldap and (xml adj schema) and (maintain$4 adj session) | USPAT | [illegible]
8 | 1 | (ldap and (xml adj schema) and session) | USPAT | [illegible]
9 | 0 | ldap and xml-rpc and session | USPAT | [illegible]
10 | 5 | ldap and xml-rpc and session | US-PGPUB; EPO; JPO; DERWENT; IBM_TDB | [illegible]
11 | [illegible] | ldap and xml and rpc and session | US-PGPUB; EPO; JPO; DERWENT; IBM_TDB | [illegible]
12 | 22 | ldap and xml and rpc and session | USPAT | 2003/07/31 01:02

Search History 7/31/03 1:10:25 AM Page 1



L Number | Hits | Search Text | DB | Time stamp
[illegible] | 1 | ldap and (xml-schema) and (maintain$4 adj session) | US-PGPUB; EPO; JPO; DERWENT; IBM_TDB | [illegible]
7 | 0 | ldap and (xml adj schema) and (maintain$4 adj session) | USPAT | [illegible]
8 | 1 | (ldap and (xml adj schema) and session) | USPAT | [illegible]
9 | [illegible] | ldap and xml-rpc and session | USPAT | [illegible]
10 | [illegible] | ldap and xml-rpc and session | US-PGPUB; EPO; JPO; DERWENT; IBM_TDB | [illegible]
11 | [illegible] | ldap and xml and rpc and session | US-PGPUB; EPO; JPO; DERWENT; IBM_TDB | [illegible]
12 | 22 | ldap and xml and rpc and session | USPAT | 2003/07/31 01:18
13 | 35 | ldap and xml and schema | USPAT | 2003/07/31 01:19

Search History 7/31/03 3:16:11 AM Page 1



L Number | Hits | Search Text | DB | Time stamp
[illegible] | [illegible] | registry and service and ldap interface and session | USPAT | [illegible]
4 | [illegible] | (xml and registry and service and ldap interface and ((save maintain$4 keep$4) adj session)) | USPAT | [illegible]
[illegible] | [illegible] | ((xml and registry and service and ldap interface and ((save maintain$4 keep$4) adj session))) and @ad<=19990610 | USPAT | [illegible]
6 | 20 | (schema and xml and (registry database directory) and service and ldap and (interface api rpc) and ((save maintain$4 keep$4) adj session)) | USPAT | 2003/07/31 21:57

Search History 7/31/03 10:27:33 PM Page 1
C:\APPS\EAST\workspaces\9646726.wsp





Citation

Symposium on Applied Computing
Proceedings of the 1999 ACM symposium on Applied computing
1999, San Antonio, Texas, United States

Implementing catalog clearinghouses with XML and XSL

Author

Andrew V. Royappa

Sponsors

SIGADA : ACM Special Interest Group on Ada Programming Language
SIGCUE : ACM Special Interest Group on Computer Uses In Education
SIGAPP : ACM Special Interest Group on Applied Computing
SIGBIO : ACM Special Interest Group on Biomedical Computing

Publisher

ACM Press New York, NY, USA

Pages: 616-621 Series-Proceeding-Article
Year of Publication: 1999
ISBN: 1-58113-086-4

http://doi.acm.org/10.1145/298151.298495

REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose
the complete List rather than only correct and linked references.

1 Serge Abiteboul, Nicole Bidoit, Non first normal form relations: An algebra allowing data restructuring, Journal
of Computer and System Sciences, v.33 n.3, p.361-393, Dec. 1986

2 Alfred V. Aho, Ravi Sethi, Jeffrey D. Ullman, Compilers: principles, techniques, and tools, Addison-Wesley
Longman Publishing Co., Inc., Boston, MA, 1986



1 of 4    7/31/03 11:58 PM



3 Berg, D.L., Gonnet, G. H. and Tompa, F. W., The New Oxford English Dictionary Project at the University of
Waterloo, Technical Report OED-88-01, University of Waterloo Centre for the New Oxford English Dictionary
(1988).

4 Blake G.E., Consens M.P., Kilpeläinen, P., Larson, P.-A., Snider, T., and Tompa, F.W., Text/Relational
Database Management Systems: Harmonizing SQL and SGML, Proceedings of Applications of Databases
(ADB-94), Vadstena, 267-280 (1994).

5 Bray, T., Paoli, J., and Sperberg-McQueen, C. M., (ed) Extensible Markup Language (XML) 1.0, World Wide
Web Consortium Recommendation 10-February-1998 (1998).

6 Burnard, L., Introducing SARA: an SGML-Aware Retrieval Application for the British National Corpus, Second
Conference on Teaching and Language Corpora, Lancaster University (1996).

7 Paolo Ciancarini, Alfredo Rizzi, Fabio Vitali, An extensible rendering engine for XML and HTML, Proceedings of
the seventh international conference on World Wide Web 7, p.225-237, April 1998, Brisbane, Australia

8 Clark, J. and Deach, S. (ed), Extensible Stylesheet Language (XSL), Version 1.0, World Wide Web Consortium
Working Draft 18-August-1998 (1998).

9 Clark, J., XT (Java transformation engine for XSL), http://jclark.com/xml/xt.html (1998).

10 Codd, E.F., Further Normalization of the Data Base Relational Model, in Data Base Systems, Courant Computer
Science Symposia Series 6, Englewood Cliffs, NJ, Prentice-Hall (1972).

11 Cover, R., Extensible Markup Language. A very large repository of XML and XSL information, at the Summer
Institute of Linguistics, http://www.sil.org/sgml/xml.html (1997).

12 Cyberchefs Home Page, http://www.cyberchefs.com/ (1998).

13 Cybermeals Home Page, http://www.cybermeals.com/ (1998).

14 Deutsch, A., Fernandez, M., Florescu, D., Levy, A., and Suciu, D., XML-QL: A Query language for XML,
Submission to the World Wide Web Consortium 19-August-1998 (1998).

15 Goldfarb, C. F., A Generalized Approach to Document Markup, ACM SIGPLAN Notices 16(6), 68-73 (1981).

16 Gonnet, G.H., Baeza-Yates, R. A., Snider, T.: Lexicographical Indices for Text: Inverted files vs. PAT trees.
Technical Report OED-91-01, University of Waterloo Centre for the New Oxford English Dictionary (1991).

17 Harié, S., Le Maitre, J., Murisasco, E. and Véronis, J., The MULTEXT SgmlQL Query Language Reference,
Multilingual Text Tools and Corpora Project, Centre National de la Recherche Scientifique,
http://www.lpl.univ-aix.fr/projects/SgmlQL/ (1997).

18 ISO 8879:1986, Information Processing - Text and Office Systems - Standard Generalized Markup Language
(SGML), International Organization for Standardization. Ref No. ISO 8879:1986 (E). Geneva/New York (1986).

19 ISO/IEC DIS 10179.2:1994, Information Technology - Text and Office Systems - Document Style Semantics and
Specification Language (DSSSL). International Organization for Standardization/International Electrotechnical
Commission. Geneva (1994).

20 Kuikka, E. and Nikunen, E., Systems for Structured Documents (large list of software),
http://www.cs.uku.fi/~kuikka/Systems97/systems97.html (1998).

21 Lie, H. and Bos, B., Cascading Style Sheets, level 1, World Wide Web Consortium Recommendation
17-December-1996 (1996).

22 R. W. Matzen, G. E. Hedrick, A new tool for SGML with applications for the World Wide Web, Proceedings of
the 1998 ACM symposium on Applied Computing, p.752-759, February 27-March 01, 1998, Atlanta, Georgia,
United States

23 Darrell R. Raymond, Flexible Text Display with Lector, Computer, v.25 n.8, p.49-60, August 1992

24 The Microsoft XSL Processor, Technology Preview Release, Microsoft Corporation,
http://www.microsoft.com/xml/xsl/msxsl.asp (1998).

25 XML Styler, ArborText Inc., http://www.arbortext.com/xmlstyler/index.htm (1998).



INDEX TERMS

Primary Classification:
H. Information Systems
H.3 INFORMATION STORAGE AND RETRIEVAL

General Terms:
Algorithms, Design, Performance

Keywords:
SGML, XML, XSL, e-commerce






IMPLEMENTING CATALOG CLEARINGHOUSES WITH XML AND XSL



Andrew V. Royappa
Department of Computer Science
Millsaps College
Jackson, MS 39210
royapav@millsaps.edu



Keywords: XML, XSL, SGML, e-commerce
ABSTRACT

A catalog clearinghouse is defined as an electronic
commerce entity that provides a common web-based
storefront to a group of merchants. This paper considers the
implementation of a specific kind of clearinghouse: one
that hosts catalogs for a large number of similar merchants
(e.g., thousands of restaurant menus). A standard
architecture is described, followed by a novel architecture
that uses the emerging Extensible Markup Language
(XML) and Extensible Stylesheet Language (XSL)
standards for data storage, search and graphical
presentation.

For large clearinghouses, the new architecture realizes
significant benefits in terms of presentation flexibility,
powerful search capabilities and increased performance.
XML and XSL are used at the client side for flexible
presentations with low maintenance costs, while XML and
traditional RDBMS techniques are combined at the server
for searching and business logic. Detailed examples of
XML document type definitions and XML structured data
are given, along with details on the recursive
transformation of XML into HTML, using rule-based XSL
style sheets.

1. INTRODUCTION 

As e-commerce on the WWW intensifies, a number of
meta-businesses have emerged. Known by various names
such as "virtual mall", such businesses provide a single
access point for groups of merchants. Customers access the
"catalog clearinghouse" web site to browse catalogs of
merchant companies that deliver goods and services. The
clearinghouse handles ordering and payment and transmits
orders to merchants for fulfillment. This enables customers
to search for goods, compare prices, etc., at a central
location instead of visiting multiple dislocated web sites.

Permission to make digital or hard copies of all or part of this work
for personal or classroom use is granted without fee provided that
copies are not made or distributed for profit or commercial
advantage and that copies bear this notice and the full citation on
the first page. To copy otherwise, to republish, to post on servers
or to redistribute to lists, requires prior specific permission and/or a
fee.

SAC '99, San Antonio, Texas
1998 ACM 1-58113-086-4/99/0001 $5.00



The clearinghouse model is also attractive to small businesses
for which the benefits of maintaining an individual web site may
never justify the overhead.

The model requires little technical expertise on the part of the
merchant, and little modification to standard business procedure.
For instance, orders can be forwarded from the clearinghouse
via fax or telephone. Restaurants fit this category, and it is
therefore natural that clearinghouses have appeared which
specialize solely in restaurants offering take-out or home
delivery services. Among these, Cyberchefs [12] and
Cybermeals [13] include thousands of restaurants. In this paper,
a restaurant clearinghouse is used as an illustrative example.

A look at existing clearinghouses, large and small, reveals two
broad implementation strategies:

1. The clearinghouse maintains a set of customized web pages
for each merchant.

2. The clearinghouse maintains a database containing catalog
information for each merchant. When a merchant's page is
accessed, its web pages are generated dynamically.

For large clearinghouses that store several thousand catalogs
(e.g., Cybermeals targets 25,000 restaurants by the end of 1998),
only the second approach is feasible. Maintaining and updating
thousands of customized web forms (e.g. to make price and
product changes) would be prohibitive, even if custom web
pages were initially created for each business. However, the
second approach has certain drawbacks:

1. All catalogs have an identical "look and feel" because
catalog web pages are programmatically generated. This makes
it hard for each merchant to establish an individual sales
identity.

2. Catalog search capabilities are limited and inefficient.
Consider the case of a restaurant clearinghouse that allows each
restaurant to list the chief ingredients of its dishes, along with
nutrition information such as calories, fat grams, diabetic
information, etc. A simple keyword search may not accurately
target "a Chinese dish with chicken and snow peas, without
peanuts or MSG, with at most 25 grams of fat, within 45
minutes delivery time of Harvard Square in Boston."

If such a query is allowed, other problems arise. In any
normalized relational database schema [10] constructed to store
such information, this query will require a costly relational
join of several tables. A large clearinghouse may expect
thousands of such queries per minute at peak periods.
Finally, while relational databases can easily search for
expressions containing comparison and Boolean operators,
they are cumbersome for searches dealing with hierarchical
relationships between fields such as "inside/outside",
"above/below", and transitive compositions of such
relationships.
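
As a rough illustration of the kind of structured query described above, the following Python sketch filters dishes stored in a small XML fragment. It is only a sketch under stated assumptions: the element and attribute names (`dish`, `ingredient`, `fat`, `cuisine`) are hypothetical and do not come from the paper, and the standard-library `xml.etree.ElementTree` module stands in for a real structured-text search engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog fragment; all element and attribute names are assumptions.
CATALOG = """
<restaurant cuisine="Chinese">
  <dish name="Chicken with snow peas" fat="18">
    <ingredient>chicken</ingredient>
    <ingredient>snow peas</ingredient>
  </dish>
  <dish name="Kung pao chicken" fat="22">
    <ingredient>chicken</ingredient>
    <ingredient>peanuts</ingredient>
  </dish>
  <dish name="Fried chicken with snow peas" fat="40">
    <ingredient>chicken</ingredient>
    <ingredient>snow peas</ingredient>
  </dish>
</restaurant>
"""

def find_dishes(root, required, excluded, max_fat):
    """Return names of dishes that contain every required ingredient,
    none of the excluded ones, and at most max_fat grams of fat."""
    matches = []
    for dish in root.iter("dish"):
        # Only <ingredient> elements inside this <dish> are considered,
        # which is the "containment" restriction a keyword search lacks.
        ingredients = {i.text for i in dish.findall("ingredient")}
        if (required <= ingredients
                and not (excluded & ingredients)
                and float(dish.get("fat")) <= max_fat):
            matches.append(dish.get("name"))
    return matches

root = ET.fromstring(CATALOG)
result = find_dishes(root, {"chicken", "snow peas"}, {"peanuts", "MSG"}, 25)
print(result)
```

Because each condition is evaluated within the context of one `dish` element, no join across separate tables is needed to relate a dish to its ingredients.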

This paper presents an architecture, or implementation
model, that uses Extensible Markup Language (XML) [5]
and Extensible Stylesheet Language (XSL) [8] to overcome
the above problems. These are web-based structured
document standards that are being developed based on
Standard Generalized Markup Language (SGML)
[15, 18].

The architecture presented allows a clearinghouse to have a
unique presentation format for each merchant's catalog,
without the overhead of maintaining individual customized
web pages. By moving formatting tasks to the client
(browser), the scheme provides load balancing that reduces
web server workload. Furthermore, expressive and efficient
catalog searches are made possible. Finally, the fault
tolerance that is realized by using a relational database to
drive the business logic (including transaction processing
for ordering, payment and delivery) is largely left
undisturbed.

The following section contains a brief introduction to XML
and its companion, XSL. After that, the paper describes the
catalog clearinghouse architecture in detail. Examples of
XML document type definitions and structure are given,
along with examples of the recursive transformation of
XML into ordinary HTML using XSL style sheets. The
paper concludes with an overview of software tools for
implementation and avenues for future work.

2. MARKUP AND STYLESHEET LANGUAGES 

2.1 XML for Structured Document Representation 

Standard Generalized Markup Language (SGML) is a
document representation standard developed by the
International Organization for Standardization, published as
ISO 8879 [18]. It is a meta-language for specifying the
syntax of domain-specific markup languages. HyperText
Markup Language (HTML) is one application of SGML.

SGML contains features that are difficult to learn and
implement, although they are not commonly used in
practice. Therefore, the World Wide Web Consortium
(W3C) has constructed a version of SGML suitable for web
use, namely XML [5]. Differences between XML and
SGML include the following:

• XML does not allow exceptions (inclusions and
exclusions) in element content models.

• XML does not have the "&" operator that denotes
elements which are required but can appear in any order.

• XML tags for elements with non-empty content
models must come in pairs, e.g. <a>...</a>. Tags with empty
content models have the special form <a/> (a trailing "/" after
the tag name).

• XML does not include abbreviation techniques (tag
minimization, tag omission).
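
The tag-pairing and empty-element rules in the list above can be checked with any XML parser. The short Python sketch below uses the standard-library `xml.etree.ElementTree` parser as a stand-in; the helper name `is_well_formed` and the sample strings are illustrative assumptions, not part of the paper.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

paired = is_well_formed("<a><b>text</b></a>")   # start and end tags come in pairs
empty = is_well_formed("<a><b/></a>")           # empty element written with a trailing "/"
omitted = is_well_formed("<a><b>text</a>")      # SGML-style end-tag omission is rejected
print(paired, empty, omitted)
```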

These changes simplify XML at the cost of expressive power.
For instance, exceptions are a powerful expressive feature of
SGML, but are difficult for both human readers and automatic
parsers to handle because they introduce ambiguity ([22]
describes techniques for modeling and understanding exceptions
and presents a case for their inclusion in XML).

The structure of XML documents can be described using
Document Type Definitions (DTD), which are specified in
Extended Backus-Naur Form [2]. A DTD is similar to a context
free grammar, so standard parsing techniques from compiler
theory apply. XML documents can conform to a DTD. They are
information rich in the sense that all semantic text is
appropriately marked up. By contrast, semantic information in
HTML documents has to be extracted from unstructured
paragraphs, using natural language processing or heuristic
techniques based on keyword positioning.

XML documents generally don't contain presentation
information. This is stored elsewhere using a system of
structure-based presentation rules (explained below). The clear
identification of semantic information via markup elements
makes it possible to have expressive user interfaces for
searching, e.g. [3], and allows for powerful indexing schemes for
quickly searching very large textual databases, e.g. [3, 6, 16].

Both Microsoft and Netscape have announced that version 5 of
their respective browsers will contain XML processors, which
are software components that parse XML documents and make
the resulting tree structures available for manipulation in code.

2.2 XSL for Transformation and Rendering of XML

The W3C has specified an Extensible Stylesheet Language
(XSL) [8] for presentation and display of XML documents. XSL
can rearrange document structure, making it strictly more
powerful than Cascading Style Sheets [21], which can only
specify how markup elements are rendered (font, color, etc.)
without allowing element reordering.

XSL works by recursively transforming XML documents into
other formats such as HTML, according to a set of style rules
augmented with a scripting language. XSL is based in part on a
styling language for SGML, Document Style Semantics and
Specification Language (DSSSL) [19]. The transformation
process can take place inside an XML-enabled browser, or in an
applet or script run by a browser without XML capabilities.

3. ARCHITECTURES FOR CLEARINGHOUSES

Two architectures are described. The first is a conventional
implementation model for clearinghouses, which can be applied
using established web development techniques. The second is a
new architecture.

3.1 The Traditional Web Development Model

The main components in a traditional implementation of a
catalog clearinghouse will generally include the users' web
browsers (clients), the clearinghouse web server, a
RDBMS, a fulfillment agent and the merchants' order
acceptance points (telephones, fax machines, e-mail
addresses, etc.). The user must first locate a particular
catalog by entering keywords into a search form, which is
posted to the web server. The server formats one or more
SQL SELECT queries based on the search terms. The
queries are sent to the RDBMS, where their execution will
typically require relational joins on a number of tables in
the database. Query results are returned to the web server.

Using a sequence of such queries, the web server
determines the number of matching catalogs. If zero or
multiple catalogs match, the user must continue searching.
Once a single catalog is identified, the web server
constructs an HTML order form dynamically. This action
can be expensive. For instance, a restaurant menu involves
many items in multiple categories (appetizers, entrées,
desserts, beverages). The server must execute nested loops
to construct an order form, usually embedded inside HTML
tables. The order form is sent to the user's browser.

The user completes the order form and submits it to the
web server. The server constructs additional SQL queries
(which may now include SELECT, INSERT and
UPDATE) that enter the order into the RDBMS. At this
point, a fulfillment agent is responsible for transmitting the
order to the appropriate merchant. If orders must be
fulfilled in real-time (e.g., restaurants), the web server itself
may act as the fulfillment agent, perhaps by relaying the
order to a fax machine. In other cases (e.g., mail-order
catalogs) a separate process can periodically extract new
orders from the database for transmission to merchant order
acceptance points.

A complete e-commerce implementation involves details
not included in the above outline, including user session
management (with HTTP cookies or hidden form fields),
secure payment and transaction processing for
fault-tolerance.
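
The server-side form construction described in this section can be sketched as follows. This is a minimal illustration, assuming a hypothetical two-table schema (`category`, `item`) held in an in-memory SQLite database; the paper does not specify a schema, and a production clearinghouse would use a separate RDBMS server.

```python
import sqlite3
from html import escape

# Hypothetical schema and sample rows; names are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE item (id INTEGER PRIMARY KEY, category_id INTEGER,
                   description TEXT, price REAL);
INSERT INTO category VALUES (1, 'Appetizers');
INSERT INTO category VALUES (2, 'Desserts');
INSERT INTO item VALUES (1, 1, 'Spring rolls', 3.50);
INSERT INTO item VALUES (2, 1, 'Soup', 2.75);
INSERT INTO item VALUES (3, 2, 'Ice cream', 2.00);
""")

def build_order_form(conn):
    """Build an HTML order form using one loop over categories
    and a nested loop over the items in each category."""
    rows = ['<FORM METHOD="POST"><TABLE>']
    categories = conn.execute(
        "SELECT id, name FROM category ORDER BY id").fetchall()
    for cat_id, cat_name in categories:              # outer loop: categories
        rows.append(f'<TR><TH COLSPAN="2">{escape(cat_name)}</TH></TR>')
        items = conn.execute(
            "SELECT id, description, price FROM item "
            "WHERE category_id = ? ORDER BY id", (cat_id,))
        for item_id, desc, price in items:           # inner loop: items
            rows.append(
                f'<TR><TD><INPUT TYPE="checkbox" NAME="item" '
                f'VALUE="{item_id}"> {escape(desc)}</TD>'
                f'<TD>{price:.2f}</TD></TR>')
    rows.append("</TABLE></FORM>")
    return "\n".join(rows)

form_html = build_order_form(conn)
print(form_html)
```

Every request for a catalog repeats this query-and-format work on the server, which is the cost the XML-based architecture aims to shift to the client.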

3.2 A New Architecture Based on XML and XSL

Representing catalog data using XML leads to an alternate
implementation architecture, which offers the benefits
outlined in the introduction. The new architecture extends
the conventional one, and is outlined in Figure 1. Two
databases are now used, an XML database for catalog
information, and an RDBMS for order processing.

The workings of the new architecture are sketched in
Figure 2. In step (1), the user submits a search form to the
web server. Search interfaces for structured text documents
can be quite powerful. For instance, the user can specify
expressions with Boolean and comparison operators as well
as hierarchical and containment restrictions, restricting
expressions to only match within certain markup-element
contexts. These contextual restrictions are much more
precise than the word-distance based searches offered by typical
web search engines. Again, although Boolean and comparison-based
searches can be done with relational databases, they are
likely to be less efficient than ones on text databases, which are
indexed differently.

The completed search form is sent to the web server (2). The
web server formulates a series of queries to the XML database
(3), where a search occurs (4), eventually locating a single
catalog (as before, the user may need to be consulted again if a
unique catalog is not located). The catalog has XML and XSL
components. These are returned to the web server (5), which
does only minimal processing (6), e.g. to add user session
information. The data is sent to the client browser (7), which
transforms the XML into HTML by applying the XSL rule set
(8). An example of this transformation is shown in section 3.3.

The browser now renders the catalog order form for the user. At
this point, the process continues much as in the traditional
model, using an RDBMS and a fulfillment agent (not shown). It
is important to continue to use an RDBMS here because
fault-tolerance is especially crucial once an order is placed.
Transaction processing systems have a clear model of failure
events and a well-established track record of fault-tolerance.

In the new architecture, the fulfillment agent may query both the
XML database and the SQL database for additional information
(e.g. merchant address). This is because the order information
entered into the RDBMS will only store catalog and item
identifiers, to avoid duplicating data that is in the XML database
(because duplicating information would violate general database
design principles).
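
A minimal sketch of this division of data follows, under stated assumptions: the `orders` table, the `XML_CATALOGS` dictionary standing in for the XML database, and all element names and sample values are hypothetical. The order row holds only identifiers, and the fulfillment step looks up the merchant address and item details in the XML catalog.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML catalog store keyed by catalog identifier.
XML_CATALOGS = {
    "r42": ET.fromstring(
        "<restaurant id='r42'><address>12 Main St</address>"
        "<item id='e1'><description>Pad thai</description>"
        "<price>8.50</price></item></restaurant>")
}

# Hypothetical order table: identifiers and quantity only, no catalog data.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, "
           "catalog_id TEXT, item_id TEXT, quantity INTEGER)")
db.execute("INSERT INTO orders VALUES (1, 'r42', 'e1', 2)")

def order_details(db, order_id):
    """Combine identifiers from the RDBMS with details from the XML catalog."""
    catalog_id, item_id, qty = db.execute(
        "SELECT catalog_id, item_id, quantity FROM orders WHERE order_id = ?",
        (order_id,)).fetchone()
    catalog = XML_CATALOGS[catalog_id]
    item = catalog.find(f"item[@id='{item_id}']")
    return {
        "address": catalog.findtext("address"),
        "description": item.findtext("description"),
        "total": qty * float(item.findtext("price")),
    }

details = order_details(db, 1)
print(details)
```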

The benefits of the new implementation can now be reviewed.
First, the web server's load is significantly reduced because
formatting is done in the browser. Next, since each XML catalog
may have a different XSL rule set, catalogs are rendered
uniquely. In addition, powerful search techniques drawing on a
long history of SGML research can be applied to perform
searches that are easier to express and more efficient than is
possible with relational data. For instance, XML-QL [14] allows
for the specification of queries with conditions, structured text
"joins", path regular expressions to handle hierarchy, nested
queries, etc. Because XML-QL itself uses a syntax similar to
that of XML documents, it is easier to express structured text
queries in a language like XML-QL than in SQL.

Finally, it should be noted that the architecture can be modified
to effect the XML to HTML transformation on the server, using
transformation software such as XT [9]. While placing an
additional load on the server, this solution can generate
browser-independent HTML. This would allow developers to deploy
solutions based on this architecture without waiting for
XML/XSL-enabled browsers to become commonplace.

3.3 Sample XML and XSL Representations

In this section, sample XML and XSL data for a "restaurant
clearinghouse" are shown. These are simplified for clarity and
brevity, e.g., the restaurant DTD shown only stores one
restaurant menu instead of many. The DTD for a restaurant
includes its address and its menu. The menu has sections for
appetizers, entrées, desserts and beverages.

Figure 1: Clearinghouse Architecture Outline

Figure 2: Clearinghouse Architecture Detail [diagram label: XML queries & results]

Each section contains one or more menu items. A
restaurant must offer at least one entrée, but may offer zero
or more items in each of the other categories. A menu item
consists of a description, a price and a "low fat" flag. An
XML DTD for these conditions is shown in Figure 3. The EBNF
notation uses a combination of common regular expression
sequencing operators and context-free grammar productions. For
instance, the operators "*", "+" and "?" denote repetitions of
"zero or more", "one or more" and "zero or one" times
respectively; the "|" operator denotes "or" (alternation) and
the "," operator denotes sequencing. "#PCDATA" denotes
(essentially) XML text.
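
Since a DTD content model behaves like a regular expression over child element names, the constraints stated above can be approximated in a few lines of Python. The sketch below is not the DTD of Figure 3: the element names (`menu`, `entrees`, `item`, `lowfat`, and so on) and the exact content models are assumptions chosen to mirror the prose description.

```python
import re
import xml.etree.ElementTree as ET

# Assumed content models, written as regular expressions over child tag names.
# "?" means zero or one, "+" means one or more, juxtaposition means sequencing.
SECTION = r"(item )+"
MODELS = {
    "menu": r"(appetizers )?(entrees )(desserts )?(beverages )?",
    "appetizers": SECTION, "entrees": SECTION,
    "desserts": SECTION, "beverages": SECTION,
    "item": r"(description )(price )(lowfat )?",
}

def conforms(elem):
    """Check an element and all its descendants against the assumed models."""
    model = MODELS.get(elem.tag)
    if model is None:                  # leaf elements may hold text only
        return len(elem) == 0
    children = "".join(child.tag + " " for child in elem)
    return (re.fullmatch(model, children) is not None
            and all(conforms(child) for child in elem))

good = ET.fromstring(
    "<menu><entrees><item><description>Pad thai</description>"
    "<price>8.50</price><lowfat/></item></entrees></menu>")
bad = ET.fromstring(          # no entree section, so the menu model fails
    "<menu><desserts><item><description>Flan</description>"
    "<price>3.00</price></item></desserts></menu>")
print(conforms(good), conforms(bad))
```

A validating XML parser performs this check directly from the DTD; the regular expressions here only make the correspondence between the operators and the stated constraints explicit.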

Figure 4 shows fragments of a data file conforming to the
DTD. Note that all tags are properly nested and appear in
pairs, except the "<lowfat/>" tag which has a content model
of EMPTY in the DTD, and hence can appear alone with
the trailing "/". The file is indented for clarity, but in
practice is stored according to XML rules for processing
whitespace.

Figure 5 shows a minimal XSL style sheet for transforming
the above XML data into ordinary HTML. The style sheet
specifies four transformation rules. Each transformation
rule contains a pattern and an action. The XML document's
natural parse tree structure is traversed to find nodes that
match the pattern part of a rule. At matching nodes, the
action part is used to derive a transformed sub-tree, which
is attached at the current node (replacing the one already
there). This process continues recursively until no patterns
match. Therefore, transformations can be specified
compactly without resorting to looping constructs.
Furthermore, the W3C design goals state that XSL should
leverage the power of scripting languages. This means that
programmers more comfortable with a procedural paradigm
will be able to use loops to effect the above transformation.
Scripting languages that make tree data available to the
programmer for use in calculations, decisions and loops can
perform arbitrary restructuring.
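
The pattern/action mechanism described above can be imitated in Python to show how recursion replaces explicit loops over the document structure. This is a sketch of the general technique, not the style sheet of Figure 5 and not XSL syntax: the rule functions, the `RULES` table and the sample element names are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Each rule pairs a pattern (an element name) with an action. The action builds
# output and calls `process` to transform the matched node's children.
def rule_menu(node, process):
    return "<TABLE>" + process(node) + "</TABLE>"

def rule_item(node, process):
    return "<TR>" + process(node) + "</TR>"

def rule_cell(node, process):
    return "<TD>" + (node.text or "") + "</TD>"

RULES = {"menu": rule_menu, "item": rule_item,
         "description": rule_cell, "price": rule_cell}

def transform(node):
    """Apply the rule matching this node; if none matches, descend to its children."""
    def process_children(n):
        return "".join(transform(child) for child in n)
    rule = RULES.get(node.tag)
    if rule is None:
        return process_children(node)
    return rule(node, process_children)

doc = ET.fromstring(
    "<menu><entrees><item><description>Pad thai</description>"
    "<price>8.50</price></item></entrees></menu>")
html = transform(doc)
print(html)
```

The `entrees` element has no rule of its own, so its items are emitted directly into the table, which resembles joining all categories into a single listing.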

The style sheet in Figure 5 is hand-coded according to the
current W3C working draft for XSL [8], and tested using
XT [9], a Java-based transformation engine for XSL.
Earlier versions were tested in Microsoft's Technology
Preview XSL processor [24], an ActiveX control. The style
sheet transforms a restaurant XML document into a simple
HTML file where all menu items (from all categories
joined together) are listed as an HTML table. HTML tags
embedded in the XSL rules are shown in uppercase for
clarity. The style sheet in Figure 5 contains a total of four
rules.

While XSL style sheets are not very amenable to hand
coding, the abundance of graphical development tools
indicates that style sheet generators will soon be routinely
available (some already exist, e.g. [25]). Once a style sheet
is created, a transformation engine such as XT [9]
recursively applies the XSL rules to transform the sample
XML document into HTML as shown in Figure 6.

4. IMPLEMENTATION APPROACHES AND 
FUTURE WORK 

The XML-based clearinghouse architecture is designed to
be implemented efficiently using primarily off-the-shelf
components. Available components range from inexpensive
choices for prototyping, to more expensive choices for
deployment. Many XML and XSL resources are listed in
[11] (an excellent starting point for learning XML). Many
software tools for structured documents are also listed at
[20]; space limits mention here to a few.

Web server activities can take place in-process or out-of-process.
Within the server process, server-side scripting and
dynamically linked libraries (i.e., server Application Program
Interfaces) provide choices between simplicity (scripting) and
performance (APIs). Alternately, the server can initiate
additional worker processes using the older Common Gateway
Interface (CGI/bin) techniques. Server-side scripting is a good
compromise between efficiency and complexity, and is
supported by leading web servers, including Apache (PHP/FI
scripting), Microsoft (VBScript/JScript) and Netscape
(JavaScript).

RDBMS choices range from simply placing the database on the
web server's host, where the server accesses it directly using
the appropriate libraries (e.g. ODBC compliant drivers), to
multi-tiered solutions involving separate servers. A separate
RDBMS server is preferable for many reasons. Choices range
from the inexpensive MySQL to high-end enterprise servers
from Oracle, IBM, Microsoft etc., which support transaction
processing. A wide variety of e-commerce code is available to
implement ordering and payment.

Tools for XML and XSL are rapidly appearing, based on
extensive SGML development. Search engines like PAT [16]
have shown that high performance is achievable on very large
databases. A public domain choice is SARA [6]. These search
engines have been extensively used with enormous textual
databases and offer the advanced search features described
earlier.

Intriguing techniques blending relational and structured-text
theory are emerging, which should lead to additional choices for
XML databases. In addition, new query languages have
appeared, e.g. SGML-QL [17] and XML-QL [14]. Search
engines for these should follow. Also, XML processors in Java
and other languages [11] are suitable for developing prototype
search engines (currently, Java run-time performance may be
inadequate for high-volume production use at the server side).

XSL software suitable for prototyping and development (but not
yet for deployment) is available. This includes XML Styler
[25] for XML-based style sheet generation and the XT processor
[9] for XSL transformations. Other Java-based XSL tools [11]
and SGML styling tools [20] are available as well.

For this paper, work is under way on a prototype
implementation using some of the tools mentioned above, and
on additional design issues. The latter include enabling
merchants to update their data and presentation online (probably
using Java), developing flexible web search interfaces based on
current SGML interfaces or Java displets [7], tracking customer
information, etc.

Additional work is also needed on applying database validation
rules and type checking to XML's textual data, and efficiently
performing SQL-like aggregate queries on XML data for reports
and data mining. These problems will probably involve the
application of techniques integrating relational and structured
data or relational variants [1,4].
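
To make the two problems concrete, the following Python sketch type-checks
the textual price fields of a menu document and then computes a per-section
average, roughly what SELECT section, AVG(price) ... GROUP BY section would
give in SQL. It is only an illustration under stated assumptions: the
helper names (parse_price, price_errors, average_price_by_section) are
hypothetical, the "daily special" record is invented to show a failing
check, and the approach is a naive full scan rather than the integrated
techniques of [1,4].

```python
import xml.etree.ElementTree as ET
from decimal import Decimal, InvalidOperation

# Element names follow Figure 4; the dessert item is a made-up record
# whose price is deliberately not a number.
SAMPLE = """
<restaurant>
  <menu>
    <appetizers>
      <item><description>cheese fritters</description><price>4.99</price></item>
      <item><description>shrimp platter</description><price>5.49</price><lowfat/></item>
    </appetizers>
    <entrees>
      <item><description>grilled duck breast</description><price>12.75</price><lowfat/></item>
    </entrees>
    <desserts>
      <item><description>daily special</description><price>ask</price></item>
    </desserts>
  </menu>
</restaurant>
"""


def parse_price(text):
    """Return the price as a Decimal, or None if it is not a valid amount."""
    try:
        value = Decimal((text or "").strip())
    except InvalidOperation:
        return None
    if not value.is_finite() or value < 0:
        return None
    return value


def price_errors(root):
    """Type check: list (description, raw price text) for every bad price."""
    errors = []
    for item in root.iter("item"):
        text = item.findtext("price")
        if parse_price(text) is None:
            errors.append((item.findtext("description"), text))
    return errors


def average_price_by_section(root):
    """Aggregate query: average valid price, grouped by menu section."""
    averages = {}
    for section in root.find("menu"):
        prices = [parse_price(i.findtext("price")) for i in section.findall("item")]
        prices = [p for p in prices if p is not None]
        if prices:  # sections with no valid prices are omitted
            averages[section.tag] = sum(prices) / len(prices)
    return averages


root = ET.fromstring(SAMPLE)
errors = price_errors(root)
averages = average_price_by_section(root)
```

The DTD of Figure 3 cannot express the constraint checked here, since it
declares price only as #PCDATA; that gap is what motivates the additional
work described above.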






Acknowledgements 

The author thanks the anonymous referees for insightful
comments on this paper.

5. REFERENCES 

[1] Abiteboul, S. and Bidoit, N., Non first normal form
relations: An algebra allowing data restructuring. Journal
of Computer and System Sciences, 33(3), 361-393 (1986).

[2] Aho, A. V., Sethi, R. and Ullman, J. D., Compilers:
Principles, Techniques, and Tools, Reading: Addison-Wesley,
1986, rpt. corr. (1988).

[3] Berg, D. L., Gonnet, G. H. and Tompa, F. W., The New
Oxford English Dictionary Project at the University of
Waterloo, Technical Report OED-88-01. University of
Waterloo Centre for the New Oxford English Dictionary
(1988).

[4] Blake, G. E., Consens, M. P., Kilpeläinen, P., Larson,
P.-A., Snider, T., and Tompa, F. W., Text/Relational Database
Management Systems: Harmonizing SQL and SGML,
Proceedings of Applications of Databases (ADB'94),
Vadstena, 267-280 (1994).

[5] Bray, T., Paoli, J., and Sperberg-McQueen, C. M., (ed)
Extensible Markup Language (XML) 1.0. World Wide
Web Consortium Recommendation 10-February-1998
(1998).

[6] Burnard, L. Introducing SARA: an SGML-Aware
Retrieval Application for the British National Corpus.
Second Conference on Teaching and Language Corpora,
Lancaster University (1996).

[7] Ciancarini, P., Rizzi, A. and Vitali, F. An extensible
rendering engine for XML and HTML, Proceedings of the
Seventh International WWW Conference, Brisbane. Also in
Computer Networks and ISDN Systems, 30, issues 1-7
(1998).

[8] Clark, J. and Deach, S. (ed). Extensible Stylesheet
Language (XSL), Version 1.0, World Wide Web
Consortium Working Draft 18-August-1998 (1998).

[9] Clark, J. XT (Java transformation engine for XSL).
http://jclark.com/xml/xt.html (1998).

[10] Codd, E. F., Further Normalization of the Data Base
Relational Model. In Data Base Systems, Courant
Computer Science Symposia Series 6. Englewood Cliffs,
NJ. Prentice-Hall (1972).

[11] Cover, R., Extensible Markup Language. A very large
repository of XML and XSL information, at the Summer
Institute of Linguistics. http://www.sil.org/sgml/xml.html
(1997).

[12] Cyberchefs Home Page. http://www.cyberchefs.com/
(1998).

[13] Cybermeals Home Page, http://www.cybermeals.com/
(1998).

[14] Deutsch, A., Fernandez, M., Florescu, D., Levy, A.,
and Suciu, D., XML-QL: A Query Language for XML.
Submission to the World Wide Web Consortium
19-August-1998 (1998).

[15] Goldfarb, C. F. A Generalized Approach to
Document Markup, ACM SIGPLAN Notices 16(6), 68-73
(1981).

[16] Gonnet, G. H., Baeza-Yates, R. A., Snider, T.,
Lexicographical Indices for Text: Inverted files vs. PAT trees,
Technical Report OED-91-01. University of Waterloo Centre for
the New Oxford English Dictionary (1991).

[17] Harié, S., Le Maitre, J., Murisasco, E. and Véronis, J.,
The MULTEXT SgmlQL Query Language Reference,
Multilingual Text Tools and Corpora Project, Centre National de
la Recherche Scientifique,
http://www.lpl.univ-aix.fr/projects/SgmlQL/ (1997).

[18] ISO 8879:1986. Information Processing - Text and Office
Systems - Standard Generalized Markup Language (SGML).
International Organization for Standardization. Ref. No. ISO
8879:1986 (E). Geneva/New York (1986).

[19] ISO/IEC DIS 10179.2:1994. Information Technology -
Text and Office Systems - Document Style Semantics and
Specification Language (DSSSL). International Organization for
Standardization/International Electrotechnical Commission.
Geneva (1994).

[20] Kuikka, E. and Nikunen, E., Systems for Structured
Documents (large list of software),
http://www.cs.uku.fi/~kuikka/Systems97/systems97.html
(1998).

[21] Lie, H. and Bos, B., Cascading Style Sheets, level 1, World
Wide Web Consortium Recommendation 17-December-1996
(1996).

[22] Matzen, R. W. and Hedrick, G. E. A New Tool for SGML
with Applications for the World Wide Web, Proceedings of the
1998 ACM/SIGAPP Symposium on Applied Computing, 752-759
(1998).

[23] Raymond, D. R., Flexible Text Display with Lector,
Computer, August, 49-60 (1992).

[24] The Microsoft XSL Processor, Technology Preview
Release, Microsoft Corporation,
http://www.microsoft.com/xml/xsl/msxsl.asp (1998).

[25] XML Styler, ArborText Inc.,
http://www.arbortext.com/xmlstyler/index.htm (1998).






<?xml version="1.0" ?>
<!DOCTYPE restaurant [
<!ELEMENT restaurant   (name, address, menu)>
<!ELEMENT name         (#PCDATA)>
<!ELEMENT address      (street, city, (state | region), zip, phone)>
<!ELEMENT menu         (appetizers, entrees, desserts, beverages)>
<!ELEMENT street       (#PCDATA)>
<!ELEMENT city         (#PCDATA)>
<!ELEMENT state        (#PCDATA)>
<!ELEMENT region       (#PCDATA)>
<!ELEMENT zip          (#PCDATA)>
<!ELEMENT phone        (#PCDATA)>
<!ELEMENT appetizers   (item*)>
<!ELEMENT entrees      (item+)>
<!ELEMENT desserts     (item*)>
<!ELEMENT beverages    (item*)>
<!ELEMENT item         (description, price, lowfat?)>
<!ELEMENT description  (#PCDATA)>
<!ELEMENT price        (#PCDATA)>
<!ELEMENT lowfat       EMPTY>
]>

Figure 3: Sample XML DTD



<restaurant>
  <name>Toad Hall Cafe</name>
  <address>
    <street>116 Old Canton Road</street>
    <city>Boston</city>
    ( ... state, zip and phone ... )
  </address>
  <menu>
    <appetizers>
      <item>
        <description>cheese fritters</description>
        <price>4.99</price>
      </item>
      <item>
        <description>shrimp platter</description>
        <price>5.49</price>
        <lowfat/>
      </item>
    </appetizers>
    <entrees>
      <item>
        <description>grilled duck breast</description>
        <price>12.75</price>
        <lowfat/>
      </item>
    </entrees>
    ( ... desserts and beverages ... )
  </menu>
</restaurant>



Figure 4: Sample XML Document 
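
One way to see how the content models of Figure 3 constrain a document
such as Figure 4 is to treat each model as a regular expression over the
names of an element's children. The Python sketch below does this by hand
for three of the declarations (appetizers, entrees and item). It is an
illustration only, not a DTD validator: the regular expressions were
transcribed manually, the name violations and both test fragments are
assumptions made here, and the elided parts of Figure 4 are deliberately
not checked.

```python
import re
import xml.etree.ElementTree as ET

# Content models from Figure 3, rewritten by hand as regular expressions
# over child element names, each name followed by one space.
CONTENT_MODELS = {
    "appetizers": r"(item )*",                   # (item*)
    "entrees": r"(item )+",                      # (item+)
    "item": r"description price (lowfat )?",     # (description, price, lowfat?)
}


def violations(root):
    """Return the tags of elements whose children break their content model."""
    problems = []
    for elem in root.iter():
        model = CONTENT_MODELS.get(elem.tag)
        if model is None:
            continue  # no model transcribed for this element
        children = "".join(child.tag + " " for child in elem)
        if re.fullmatch(model, children) is None:
            problems.append(elem.tag)
    return problems


# Hypothetical fragments: the first follows the models, the second has an
# empty <entrees> and an <item> whose children are in the wrong order.
GOOD = ("<menu><appetizers/><entrees><item><description>grilled duck breast"
        "</description><price>12.75</price><lowfat/></item></entrees></menu>")
BAD = ("<menu><appetizers><item><price>4.99</price><description>cheese "
       "fritters</description></item></appetizers><entrees/></menu>")

good_problems = violations(ET.fromstring(GOOD))
bad_problems = violations(ET.fromstring(BAD))
```

A validating XML parser performs the same kind of check for every
declaration in the DTD, which is why it is preferable to hand-written
rules in anything beyond a demonstration.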






<xsl:stylesheet result-ns="html">
  <xsl:template match="/">
    <HTML>
      <HEAD>
        <TITLE>
          <xsl:process select="restaurant/name"/>
        </TITLE>
      </HEAD>
      <BODY>
        <H1>
          Menu for
          <xsl:process select="restaurant/name"/>
        </H1>
        <HR/>
        <BR/>
        <TABLE>
          <xsl:process select="restaurant/menu"/>
        </TABLE>
      </BODY>
    </HTML>
  </xsl:template>
  <xsl:template match="item">
    <TR>
      <xsl:process-children/>
    </TR>
  </xsl:template>
  <xsl:template match="description">
    <TD>
      <xsl:process-children/>
    </TD>
  </xsl:template>
  <xsl:template match="price">
    <TD>
      <B>
        <xsl:process-children/>
      </B>
    </TD>
  </xsl:template>
</xsl:stylesheet>

Figure 5: XSL Style Sheet



<HTML>
  <HEAD>
    <TITLE>
      Toad Hall Cafe
    </TITLE>
  </HEAD>
  <BODY>
    <H1>
      Menu for Toad Hall Cafe
    </H1>
    <HR/>
    <BR/>
    <TABLE>
      <TR>
        <TD>
          cheese fritters
        </TD>
        <TD>
          <B>
            4.99
          </B>
        </TD>
      </TR>
      <TR>
        <TD>
          shrimp platter
        </TD>
        <TD>
          <B>
            5.49
          </B>
        </TD>
      </TR>
      <TR>
        <TD>
          grilled duck breast
        </TD>
        <TD>
          <B>
            12.75
          </B>
        </TD>
      </TR>
    </TABLE>
  </BODY>
</HTML>

Figure 6: XML Transformed to HTML 
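
The rule-driven processing that turns Figure 4 into Figure 6 can be
imitated in a few lines of Python. The sketch below is not an XSL
processor and does not use XT; it only mirrors the idea of the style
sheet in Figure 5, where an element with a matching template is wrapped
in the template's output and any other element falls through to its
children. The names RULES, apply_templates, apply_children and transform
are hypothetical, whitespace is collapsed rather than reproduced, and
text following a child element is ignored for simplicity.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# The three items shown in Figure 4; elided parts are left out.
SAMPLE = """
<restaurant>
  <name>Toad Hall Cafe</name>
  <menu>
    <appetizers>
      <item><description>cheese fritters</description><price>4.99</price></item>
      <item><description>shrimp platter</description><price>5.49</price><lowfat/></item>
    </appetizers>
    <entrees>
      <item><description>grilled duck breast</description><price>12.75</price><lowfat/></item>
    </entrees>
  </menu>
</restaurant>
"""


def text_of(elem):
    """Escaped, stripped text directly inside an element."""
    return escape((elem.text or "").strip())


def apply_children(elem):
    """Process each child element in document order."""
    return "".join(apply_templates(child) for child in elem)


# One entry per template in Figure 5 (item, description, price).
RULES = {
    "item": lambda e: "<TR>" + apply_children(e) + "</TR>",
    "description": lambda e: "<TD>" + text_of(e) + "</TD>",
    "price": lambda e: "<TD><B>" + text_of(e) + "</B></TD>",
}


def apply_templates(elem):
    """Use the matching rule, else emit the element's text and children."""
    rule = RULES.get(elem.tag)
    if rule is not None:
        return rule(elem)
    return text_of(elem) + apply_children(elem)


def transform(root):
    """Counterpart of the root template: page skeleton plus menu rows."""
    name = apply_templates(root.find("name"))
    rows = apply_templates(root.find("menu"))
    return ("<HTML><HEAD><TITLE>" + name + "</TITLE></HEAD>"
            "<BODY><H1>Menu for " + name + "</H1><HR/><BR/>"
            "<TABLE>" + rows + "</TABLE></BODY></HTML>")


html = transform(ET.fromstring(SAMPLE))
```

Keeping the rules in a table separate from the traversal is the point of
the exercise: a merchant's presentation can change by editing the rules
alone, which is the role the style sheet plays in the design above.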






